-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EFM] Recoverable Random Beacon State Machine #6771
base: feature/efm-recovery
Are you sure you want to change the base?
[EFM] Recoverable Random Beacon State Machine #6771
Conversation
…nd state but rather current
…internals of state machine
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/efm-recovery #6771 +/- ##
========================================================
+ Coverage 41.71% 43.40% +1.69%
========================================================
Files 2030 1529 -501
Lines 180459 141392 -39067
========================================================
- Hits 75285 61378 -13907
+ Misses 98978 75015 -23963
+ Partials 6196 4999 -1197
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
…m-beacon-state-machine
@@ -154,24 +154,24 @@ func (e *ReactorEngine) startDKGForEpoch(currentEpochCounter uint64, first *flow | |||
Hex("first_block_id", firstID[:]). // id of first block in EpochSetup phase | |||
Logger() | |||
|
|||
// if we have started the dkg for this epoch already, exit | |||
// if we have dkgState the dkg for this epoch already, exit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// if we have dkgState the dkg for this epoch already, exit | |
// if we have started the dkg for this epoch already, exit |
There are a few other instances of this replacement in this file. Assuming it's a rename gone wrong.
@@ -427,10 +431,10 @@ func (e *ReactorEngine) end(nextEpochCounter uint64) func() error { | |||
// has already abandoned the happy path, because on the happy path the ReactorEngine is the only writer. | |||
// Then this function just stops and returns without error. | |||
e.log.Warn().Err(err).Msgf("node %s with index %d failed DKG locally", e.me.NodeID(), e.controller.GetIndex()) | |||
err := e.dkgState.SetDKGEndState(nextEpochCounter, flow.DKGEndStateDKGFailure) | |||
err := e.dkgState.SetDKGState(nextEpochCounter, flow.DKGStateFailure) | |||
if err != nil { | |||
if errors.Is(err, storage.ErrAlreadyExists) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetDKGState
doesn't return this error type, it returns InvalidTransitionRandomBeaconStateMachineErr
.
The comment above is outdated now, but it implies that we might already have persisted a failure state:
By convention, if we are leaving the happy path, we want to persist the first failure symptom
The state machine does not allow transitions from one state to itself (except for RandomBeaconKeyCommitted
). If, as the current comment suggests, a failure state is already set at this point, we will throw InvalidTransitionRandomBeaconStateMachineErr
as DKGStateFailure-> DKGStateFailure
is not a valid transition. I don't think this is the case, but I'll return to this after reading further through the PR.
Suggestions:
- update comment on lines 428-432
- remove
ErrAlreadyExists
check
// Despite the key origin this is a terminal state which defines a safe Random Beacon key for the next epoch and allow node | ||
// to participate in the Random Beacon protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Despite the key origin this is a terminal state which defines a safe Random Beacon key for the next epoch and allow node | |
// to participate in the Random Beacon protocol. | |
// Regardless of the key origin, this is a terminal state which defines a safe Random Beacon key for the next epoch and allows the node | |
// to participate in the Random Beacon protocol. |
flow.DKGStateStarted: {flow.DKGStateCompleted, flow.DKGStateFailure, flow.RandomBeaconKeyCommitted}, | ||
flow.DKGStateCompleted: {flow.RandomBeaconKeyCommitted, flow.DKGStateFailure, flow.RandomBeaconKeyCommitted}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flow.DKGStateStarted: {flow.DKGStateCompleted, flow.DKGStateFailure, flow.RandomBeaconKeyCommitted}, | |
flow.DKGStateCompleted: {flow.RandomBeaconKeyCommitted, flow.DKGStateFailure, flow.RandomBeaconKeyCommitted}, | |
flow.DKGStateStarted: {flow.DKGStateCompleted, flow.DKGStateFailure, flow.RandomBeaconKeyCommitted}, | |
flow.DKGStateCompleted: {flow.RandomBeaconKeyCommitted, flow.DKGStateFailure}, |
var allowedStateTransitions = map[flow.DKGState][]flow.DKGState{ | ||
flow.DKGStateStarted: {flow.DKGStateCompleted, flow.DKGStateFailure, flow.RandomBeaconKeyCommitted}, | ||
flow.DKGStateCompleted: {flow.RandomBeaconKeyCommitted, flow.DKGStateFailure, flow.RandomBeaconKeyCommitted}, | ||
flow.RandomBeaconKeyCommitted: {flow.RandomBeaconKeyCommitted}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we permit self-transitions for only this state?
RetrieveMyBeaconPrivateKey(epochCounter uint64) (key crypto.PrivateKey, safe bool, err error) | ||
} | ||
|
||
// DKGStateReader is a ready-only interface for reading state of the Random Beacon Recoverable State Machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// DKGStateReader is a ready-only interface for reading state of the Random Beacon Recoverable State Machine. | |
// DKGStateReader is a read-only interface for reading state of the Random Beacon Recoverable State Machine. |
// No errors are expected during normal operations. | ||
UpsertMyBeaconPrivateKey(epochCounter uint64, key crypto.PrivateKey) error | ||
} | ||
|
||
// InvalidTransitionRandomBeaconStateMachineErr is a sentinel error that is returned in case an invalid state transition is attempted. | ||
type InvalidTransitionRandomBeaconStateMachineErr struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type InvalidTransitionRandomBeaconStateMachineErr struct { | |
type InvalidDKGStateTransitionError struct { |
The states are called flow.DKGState
: I think we can use the more concise name here.
Minor nitpick on error naming convention: ErrXYZ
for sentinels, XYZError
for types.
err = store.SetDKGStarted(epochCounter) | ||
assert.NoError(t, err) | ||
t.Run("-> flow.DKGStateStarted, should be allowed", func(t *testing.T) { | ||
epochCounter++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
epochCounter++ |
I'm guessing this was a previous strategy to ensure that all sub-tests use a different epoch counter. But currently each sub-test is calling setupState()
, which uses a random epoch counter. Unless I'm missing something, I think we can remove.
}) | ||
|
||
t.Run("-> flow.DKGStateFailure, should be allowed", func(t *testing.T) { | ||
epochCounter++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
epochCounter++ |
epochCounter := setupState() | ||
|
||
actualState, err := store.GetDKGState(epochCounter) | ||
require.NoError(t, err, storage.ErrNotFound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require.NoError(t, err, storage.ErrNotFound) | |
require.NoError(t, err) |
#6725
Context
The goal of this PR is to implement a state machine which allows to cover all possible cases which can happen when performing DKG. Previously our implementation was enforcing some of the rules but not all of them, what made things worse is that logic and interfaces wasn't explicit enough so an engineer can easily understand it. In this PR we have made an attempt to implement a robust approach to handling possible state and state transitions for both happy path and recovery path.
Some key points:
badger.RecoverablePrivateBeaconKeyStateMachine
implements the state machine itself.dkg.ReactorEngine
anddkg.BeaconKeyRecovery
.EpochCommit
for respective epoch.Link to the diagram which describes the state machine: